Section: New Results

Statistical learning methodology and theory

Participants: Gilles Celeux, Christine Keribin, Erwan Le Pennec, Pascal Massart, Lucie Montuelle, Jean-Michel Poggi, Adrien Saumard, Solenne Thivin.

Unsupervised segmentation is similar to unsupervised classification, with an added spatial aspect. Functional data are acquired at points of a spatial domain, and the goal is to segment that domain into homogeneous regions. The range of applications includes hyperspectral images in conservation sciences, fMRI data and, more generally, any spatialized functional data. Erwan Le Pennec and Lucie Montuelle are focusing on how to handle the spatial component, from both the theoretical and the practical points of view. They study in particular the choice of the number of clusters. Furthermore, as functional data require heavy computation, they need to propose numerically efficient algorithms. They have also extended the model to regression mixtures.

Gilles Celeux, Christine Keribin and the PhD student Vincent Brault continue their work on the Latent Block Model (LBM). They compared several model selection criteria for binary tables [19]. However, the SEM-VEM Gibbs algorithm used to estimate the LBM is subject to spurious solutions (empty clusters). To tackle this drawback, they have proposed to use Bayesian inference through Gibbs sampling and studied the influence of the calibration of non-informative prior distributions. They showed through numerical experiments the advantages of coupling Gibbs sampling with a Variational Bayes algorithm to get pointwise estimators [17]. Furthermore, they extended the previous studies from binary to categorical data [32].

Christine Keribin has proposed to compare, on genomics applications, the use of the LBM with other methodologies (the variable selection procedure of Maugis and Martin-Magniette, component analysis). She supervised a Master 1 internship on the use of principal component analysis for gene expression data (Inria funding). This has been done on data from the SONATA project (led by URGV - Evry Genopole), in collaboration with Marie-Laure Martin-Magniette.

Erwan Le Pennec is supervising Solenne Thivin in her CIFRE PhD thesis with Michel Prenat and Thales Optronique. The aim is target detection on complex backgrounds such as clouds or sea. Their approach is a local test approach based on test decision theory. A key issue is to learn good discriminant features and their probabilistic properties. So far, they have worked on cloud images provided by Thales. They focus on a Markovian modeling of the clouds.

Considering the case of maximum likelihood density estimation on histograms, Adrien Saumard has investigated both theory and methodology. On the one hand, he has shown that AIC is twice the minimal penalty in the sense of Birgé and Massart, which consequently implies the asymptotic optimality of the slope heuristics based on a linear penalty shape. On the other hand, he has investigated the methodology in the small to moderate sample size setting. The robustness of the slope heuristics compared to AIC is shown on simulated examples, and a new overpenalization of Akaike's criterion is proposed, which outperforms the AICc criterion of Hurvich and Tsai and shows results comparable to the procedure proposed by Birgé and Rozenholc in 2006. The benefit of the procedure derived here is its theoretical background and interpretation. This work is still in progress, and some of the results can be found in a preprint.
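As an illustration of the setting above, the following minimal sketch selects the number of bins of a regular histogram with the plain AIC penalty. It is not the procedure developed in this work: it assumes data on [0, 1] and regular partitions, it does not implement the slope heuristics or the new overpenalization, and the function names are hypothetical.

```python
import math


def histogram_log_likelihood(sample, n_bins):
    """Maximized log-likelihood of a regular histogram with n_bins bins on [0, 1].

    Assumption for this sketch: every observation lies in [0, 1].
    """
    n = len(sample)
    counts = [0] * n_bins
    for x in sample:
        # An observation equal to 1.0 is placed in the last bin.
        idx = min(int(x * n_bins), n_bins - 1)
        counts[idx] += 1
    # On a bin of width 1 / n_bins, the maximum likelihood density
    # equals (count / n) * n_bins. Empty bins contribute nothing.
    return sum(c * math.log(c * n_bins / n) for c in counts if c > 0)


def select_bins_aic(sample, max_bins):
    """Return the number of bins minimizing -loglik + (D - 1).

    A histogram with D bins has D - 1 free parameters, so this is the
    usual AIC (-2 * loglik + 2 * (D - 1)) divided by two.
    """
    best_bins, best_crit = None, math.inf
    for d in range(1, max_bins + 1):
        crit = -histogram_log_likelihood(sample, d) + (d - 1)
        if crit < best_crit:
            best_bins, best_crit = d, crit
    return best_bins
```

A call such as select_bins_aic(data, 20) returns the selected number of bins among 1 to 20. Since AIC is twice the minimal penalty in this setting, halving the term (d - 1) in the criterion would correspond to the minimal penalty, below which the selected dimension is expected to jump to large values.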